75 research outputs found

    A cell set for self-timed design using actel FPGAs

    Get PDF
    technical reportAsynchronous or self-timed systems that do not rely on a global clock to keep system components synchronized can offer significant advantages over traditional clocked circuits in a variety of applications. However, these systems require that suitable self-timed circuit primitives are available for building the system. This report describes a cell set designed for building self-timed circuits and systems using Actel field programmable gate arrays (FPGAs). The cells use a two-phase transition signalling protocol for control signals and a bundled protocol for data signals. This library of macro cells is designed to be used with the Workmen tool suite from VIEWlogic and the Action Logic System (ALS) from Actel

    Low latency self-timed flow-through FIFOs

    Get PDF
    Journal ArticleSelf-timed flow-through FIFOs are constructed easily using only a single C-element as control for each stage of the FIFO. Throughput can be very high in this type of FIFO as the communication required to send new data to the FIFO is local to only the first element of the FIFO. Circuit density can also be high because the control overhead is very small. However, because data must travel through every cell in the FIFO when moving from input to output, latencies can be long. This paper describes some alternative approaches to building self-timed flow-through FIFOs that reduce the latency while retaining the high throughput and relative simplicity of a flow-through design. Five designs are presented: a standard linear pow-through FIFO in which the data pass through every latch in the FIFO, a parallel FIFO in which data are delivered in turn to a set of parallel flow-through FIFOs, a tree FIFO in which data are fanned out into a tree of simple FIFOs, a square FIFO in which the tree is organized as a square array to achieve better layout packing, and a folded FIFO in which data will try to skip as many of the empty FIFO cells as possible to find the shortest path to the output

    Reduced latency self-timed FIFO circuits

    Get PDF
    technical reportSelf-timed flow-through FIFOs are constructed easily using only a single C-element as control for each stage of the FIFO. Throughput can be very high in FIFOs of this type because new data can be sent to the FIFO after communicating locally with only the first element of the FIFO. Therefore the throughput is independent of the depth of the FIFO. Density can also be high because the control overhead is very small. However, because data must travel through every cell in the FIFO when moving from input to output, latencies can be long. This report describes some alternative approaches to building self-timed flow-through FIFOs that reduce the latency while retaining the high throughput and relative simplicity of a flow-through design. Five designs are presented: a standard linear flow-through FIFO in which the data pass through every latch in the FIFO, a parallel FIFO in which data are delivered in turn to a set of parallel flow-through FIFOs, a tree FIFO in which data are fanned out into a tree of simple FIFOs, a square FIFO in which the tree is organized as a square array to achieve better layout packing, and an arbited FIFO in which data will try to skip as many of the empty FIFO cells as possible to find the shortest path to the output

    The NSR processor

    Get PDF
    Journal ArticleThe NSR (Non-Synchronous RISC) processor is a general-purpose computer structured (IS U collection of self-timed blocks that operate concurrently and communicate over bundled data channels in the style of micropipelines [3, 16]. These blocks correspond to standard synchronous pipeline stages such us Instruction Fetch, Instruction Decode, Execute, Memory and register File, but each operates concurrently as a separate self-timed process. In addition to being internally self-timed, the units are decoupled through self-timed FIFO queues between each of the units which allows U high degree of overlap in instruction execution. Branches, jumps, and memory accesses are also decoupled through the use of additional FIFO queues which can hide the execution latency of these instructions. A prototype implementation of the NSR processor has been constructed using Actel FPGAs (Field Programmable Gate Arrays)

    A correctness criterion for asynchronous circuit validation and optimization

    Get PDF
    technical reportWe propose a new relation C. called strong conformance in the context of Dill's trace theory, and define B Q A to be true exactly when B conforms to A and the success set of B contains the success set of A. When B C. A, module B operated in module A's maximal environment AM (i.e. B || AM) exhibits all the traces that A \\ AM exhibits. In addition, if A has a success trace x, B can have additional success traces of the form xi?* where i is an input and a is the alphabet of the trace structure. This means that B can have additional capabilities that A does not. We show that strong conformance is more useful than conformance (defined by Dill) in detecting certain errors in asynchronous circuits. Strong conformance also helps justify circuit optimization rules that replace a component A by another component B that may have extra capabilities (e.g. can accept more inputs). The structural operators compose, rename, and hide of Dill's trace theory are monotonic with respect to strong conformance. Experiments using a modified version of Dill's trace theory verifier are presented

    A gallium arsenide mutual exclusion element

    Get PDF
    technical reportA mutual exclusion element is a key component in building asynchronous and self-timed circuits. As part of our effort to design high performance self-timed circuits, we have designed a mutual exclusion element in gallium arsenide. This circuit has been fabricated in a 1.2? process and tested. A test circuit using matched delay lines was used to verify mat the circuit operates correctly when input requests arrive simultaneously

    Editorial asynchronous architecture

    Get PDF
    Journal ArticleAsynchronous design is enjoying a worldwide resurgence of interest following several decades in obscurity. Many of the early computers employed asynchronous design techniques, but since the mid 1970s almost all digital design has been based around the use of a central clock. The clock simplifies most aspects of design and offers methodologies which are straightforward and easy to automate. These benefits have helped digital engineers to take advantage of the ever-expanding resource at their disposal, while keeping design costs under control. Comprehensive CAD systems used pervasively in industry, and targeted specifically at synchronous design styles, are one way of achieving this

    Fred: an architecture for a self-timed decoupled computer

    Get PDF
    Journal ArticleDecoupled computer architectures provide an effective means of exploiting instruction level parallelism. Selftimed micropipeline systems are inherently decoupled due to the elastic nature of the basic FIFO structure, and may be ideally suited for constructing decoupled computer architectures. Fred is a self-timed decoupled, pipelined computer architecture based on micropipelines. We present the architecture of Fred, with specific details on a micropipelined implementation that includes support for multiple functional units and out-of-order instruction completion due to the self-timed decoupling

    Estimating performance of an ray- tracing ASIC design

    Get PDF
    Journal ArticleRecursive ray tracing is a powerful rendering technique used to compute realistic images by simulating the global light transport in a scene. Algorithmic improvements and FPGA-based hardware implementations of ray tracing have demonstrated realtime performance but hardware that achieves performance levels comparable to commodity rasterization graphics chips is still not available. This paper describes the architecture and ASIC implementations of the DRPU design (Dynamic Ray Processing Unit) that closes this performance gap. The DRPU supports fully programmable shading and most kinds of dynamic scenes and thus provides similar capabilities as current GPUs. It achieves high efficiency due to SIMD processing of floating point vectors, massive multithreading, synchronous execution of packets of threads, and careful management of caches for scene data. To support dynamic scenes B-KD trees are used as spatial index structures that are processed by a custom traversal and intersection unit and modified by an Update Processor on scene changes

    Peephole optimization of asynchronous macromodule networks

    Get PDF
    Journal ArticleAbstract- Most high-level synthesis tools for asynchronous circuits take descriptions in concurrent hardware description languages and generate networks of macromodules or handshake components. In this paper, we propose a peephole optimizer for these networks. Our peephole optimizer first deduces an equivalent blackbox behavior for the network using Dill's tracetheoretic parallel composition operator. It then applies a new procedure called burst-mode reduction to obtain burst-mode machines from the deduced behavior. In a significant number of examples, our optimizer achieves gate-count improvements by a factor of five, and speed (cycle-time) improvements by a factor of two. Burst-mode reduction can be applied to any macromodule network that is delay insensitive as well as deterministic. A significant number of asynchronous circuits, especially those generated by asynchronous high-level synthesis tools, fall into this class, thus making our procedure widely applicable
    • …
    corecore